N-grams based feature selection and text representation for Chinese Text Classification
Similar resources
N-grams based feature selection and text representation for Chinese Text Classification
In this paper, text representation and feature selection strategies for Chinese text classification based on n-grams are discussed. A two-step feature selection strategy is proposed, which combines preprocessing within classes with feature selection among classes. Four different feature selection methods and three text representation weights are compared in exhaustive experiments. Both C-SVC...
Feature Selection on Chinese Text Classification Using Character N-Grams
In this paper, we perform Chinese text classification using an n-gram text representation on TanCorp, a new large corpus for Chinese text classification containing more than 14,000 texts divided into 12 classes. We use different n-gram features (1-, 2-grams or 1-, 2-, 3-grams) to represent documents. Different feature weights (absolute text frequency, relative text frequency, absolute n-gram f...
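The character n-gram representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `char_ngrams` and `text_frequencies` are hypothetical, the sample documents are made up, and the sketch assumes that "absolute text frequency" means the number of texts containing an n-gram and "relative text frequency" means that count divided by the number of texts; the abstract itself does not define these weights.

```python
from collections import Counter


def char_ngrams(text, n_values=(1, 2)):
    """Return overlapping character n-grams of each length in n_values.

    Whitespace is dropped first; no word segmentation is needed,
    which is why character n-grams are convenient for Chinese text.
    """
    chars = [c for c in text if not c.isspace()]
    grams = []
    for n in n_values:
        for i in range(len(chars) - n + 1):
            grams.append("".join(chars[i:i + n]))
    return grams


def text_frequencies(docs, n_values=(1, 2)):
    """Count, for each n-gram, how many texts contain it.

    Assumed definitions (not taken from the paper):
      absolute text frequency = number of texts containing the n-gram
      relative text frequency = absolute text frequency / number of texts
    """
    absolute = Counter()
    for doc in docs:
        # set() so that each text contributes at most once per n-gram
        absolute.update(set(char_ngrams(doc, n_values)))
    relative = {g: count / len(docs) for g, count in absolute.items()}
    return absolute, relative


# Made-up sample texts, for illustration only
docs = ["中文文本分类", "文本表示", "特征选择"]
absolute, relative = text_frequencies(docs)
```

Passing `n_values=(1, 2, 3)` would give the 1-, 2-, 3-gram variant the abstract mentions. Counting each n-gram once per text (rather than once per occurrence) is what distinguishes a text frequency from an n-gram frequency under the assumption above.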
Cluster Based Symbolic Representation and Feature Selection for Text Classification
In this paper, we propose a new method of representing documents based on clustering of term frequency vectors. For each class of documents we propose to create multiple clusters to preserve the intraclass variations. Term frequency vectors of each cluster are used to form a symbolic representation by the use of interval valued features. Subsequently we propose a novel symbolic method for featu...
Feature Selection and Representation in Text Classification
Text classification remains an important practical application of both modern machine learning (ML) and natural language processing (NLP) techniques. The influence of these disparate areas of research has contributed much to the success of current state of the art classification methods. This essay provides an overview of the field of text classification, and investigates in particular the topi...
Two-step Feature Selection Algorithm Based on N-gram Representation in Chinese Text Classification
Usually, there are two steps in the construction of an automated text classification system. The first step is that the texts are coded into a representation more suitable for the learning algorithm. There are various ways of representing a text such as by using word fragments, words, phrases, meanings, and concepts [82]. Different text representations have different dependence on the language ...
Journal
Journal title: International Journal of Computational Intelligence Systems
Year: 2009
ISSN: 1875-6891,1875-6883
DOI: 10.1080/18756891.2009.9727668